
Expected Values

The Expected Value of a Random Variable $X$ is the sum, over each possible value¹ of $X$, of that value times the Probability of seeing it. In the discrete case (and you'll see all of these as ways of notating the same idea),


$$\begin{align*} E[X] &= \sum_x x \, P(X=x) \\ &= \sum_x x \, P(x) \\ &= \sum_x x \, f_X(x) \\ &= \sum_x x \, f(x) \end{align*}$$

Where $f_X(x)$ is the probability distribution² of $X$. You'd use the fancy $\int$ in place of the humble $\sum$ above in the continuous case.
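A minimal sketch of the discrete sum in Python. The fair six-sided die and the helper name `expected_value` are assumptions chosen for illustration; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def expected_value(pmf):
    """Sum of x * P(X = x) over every value x in pmf (a dict: value -> probability)."""
    return sum(x * p for x, p in pmf.items())


# Assumed example: a fair six-sided die, so each face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

die_mean = expected_value(die_pmf)
print(die_mean)  # 7/2, i.e. 3.5
```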

Linearity and Constancy

The Expected Value of a number 42 is... drumroll... 42. This may seem silly to state but there's a Linearity property that's pretty important. If $X$ is a Random Variable and $a$ and $b$ are some constants,

$$E[a + bX] = E[a] + E[bX] = a + bE[X]$$

This is a Very Nice Thing to use in proofs and computations. $a$ shifts and $b$ scales $E[X]$.
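A quick numerical check of the Linearity property, as a sketch. The fair die and the constants `a = 10`, `b = 2` are arbitrary assumptions; any constants would do.

```python
from fractions import Fraction


def expected_value(pmf):
    """Sum of x * P(X = x) over every value x in pmf (a dict: value -> probability)."""
    return sum(x * p for x, p in pmf.items())


# Assumed example: a fair six-sided die.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
a, b = 10, 2  # arbitrary constants

# Distribution of a + bX: every value moves, the probabilities stay put.
shifted_pmf = {a + b * x: p for x, p in die_pmf.items()}

lhs = expected_value(shifted_pmf)      # E[a + bX]
rhs = a + b * expected_value(die_pmf)  # a + bE[X]
print(lhs, rhs)  # 17 17
```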

Important Things

$E[E[X]] = E[X]$. May seem obvious to a lot of people but not to yours truly because I overthink things. $E[X]$ has been computed and is not a Random Variable!

What is the Expected Value of a Dice Roll?
3.5
OK what is the Expected Value of the Expected Value of a Dice Roll?
We just did that. Still 3.5, the Expected Value of a Dice Roll... are you okay?

$\square$

Now this one's a doozy: if $Y$ is another Random Variable, $E[E[Y|X]] = E[Y]$. How can that be?

Consider this: What is the Expected Value of the height $H$ of a person picked at random from 145 people, where each person is equally likely to be picked?

$$\begin{align*} E[H] &= \sum_{i=1}^{145} h_i \cdot P(H=h_i) \\ &= \sum_{i=1}^{145} h_i \cdot \frac{1}{145} \\ &= \frac{1}{145} \sum_{i=1}^{145} h_i \end{align*}$$

Now you ask: What is the Expected Value of the height given a Random Variable Sex, $S \in \{\text{Male}, \text{Female}, \text{Intersex}\}$? This is a conditional expectation. Remembering that the Expected Value is the sum of products of realizations and their probabilities (conditional probabilities, this time),

$$E[H|S] = \sum_h h \cdot P(H=h|S)$$

Now $E[H|S]$ is still a Random Variable because we haven't specified a value for $S$ (i.e., we haven't 'collapsed' it to a specific thing like $E[H|S=\text{Female}]$). It takes the value $E[H|S=s]$ with probability $P(S=s)$. So once again, remembering that the Expected Value is the sum of products of values and their probabilities,

$$\begin{align*} E[E[H|S]] &= \sum_{s\in\{M,F,I\}} E[H|S=s] \cdot P(S=s) \\ &= \sum_{s\in\{M,F,I\}} \left[\sum_h h \cdot P(H=h|S=s)\right] \cdot P(S=s) \\ &= \sum_h h \cdot P(H=h|S=\text{Male}) \cdot P(S=\text{Male}) \\ &+ \sum_h h \cdot P(H=h|S=\text{Female}) \cdot P(S=\text{Female}) \\ &+ \sum_h h \cdot P(H=h|S=\text{Intersex}) \cdot P(S=\text{Intersex}) \\ &= \sum_h h \cdot P(H=h) \\ &= E[H] \end{align*}$$

The second-to-last step swaps the order of the sums and uses the law of total probability: $\sum_s P(H=h|S=s) \cdot P(S=s) = P(H=h)$.

So you're averaging the height over every group in $S$, each weighted by how likely that group is, which is simply $E[H]$ 🥳 This is very nice when we get to Variance and Covariance!
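Here is a small sketch that checks $E[E[H|S]] = E[H]$ numerically. All the probabilities and heights below are invented purely for illustration, and the names `p_s`, `p_h_given_s` and `expected_value` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# Invented numbers, purely for illustration.
p_s = {"M": Fraction(1, 2), "F": Fraction(2, 5), "I": Fraction(1, 10)}  # P(S = s)
p_h_given_s = {  # P(H = h | S = s), heights in cm
    "M": {170: Fraction(1, 2), 180: Fraction(1, 2)},
    "F": {160: Fraction(1, 2), 170: Fraction(1, 2)},
    "I": {160: Fraction(1, 4), 170: Fraction(1, 2), 180: Fraction(1, 4)},
}


def expected_value(pmf):
    """Sum of x * P(X = x) over every value x in pmf (a dict: value -> probability)."""
    return sum(x * p for x, p in pmf.items())


# E[H | S = s]: one number per value of S.
cond_mean = {s: expected_value(pmf) for s, pmf in p_h_given_s.items()}

# E[E[H | S]]: weight each conditional mean by P(S = s).
tower = sum(cond_mean[s] * p_s[s] for s in p_s)

# E[H] directly, from the marginal P(H = h) = sum over s of P(H = h | S = s) * P(S = s).
p_h = defaultdict(Fraction)
for s, pmf in p_h_given_s.items():
    for h, p in pmf.items():
        p_h[h] += p * p_s[s]
direct = expected_value(p_h)

print(tower, direct)  # 341/2 341/2
```

Dropping the $P(S=s)$ weights and simply adding the three conditional means gives 510 here, which is not an Expected Value of anything. The weights are what make the identity work.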

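Variance and Covariance

The Variance of a Random Variable $X$ is how much we expect it to deviate from its Expected Value. The squared deviation $(X - E[X])^2$ is a Random Variable itself³, and the Variance is its Expected Value, so the Variance is a plain number. We square the deviation first because we want a nice non-negative number.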

$$\begin{align*} \text{Var}(X) &= E[(X - E[X])^2] \\ &= E[X^2 - 2 \cdot X \cdot E[X] + (E[X])^2] \\ &= E[X^2] - E[2 \cdot X \cdot E[X]] + E[(E[X])^2] \\ &= E[X^2] - 2 \cdot E[X] \cdot E[E[X]] + (E[X])^2 \\ &= E[X^2] - 2 \cdot E[X] \cdot E[X] + (E[X])^2 \\ &= E[X^2] - 2 \cdot (E[X])^2 + (E[X])^2 \\ &= E[X^2] - (E[X])^2 \end{align*}$$
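As a sanity check, this sketch computes the first and last lines of the derivation for a fair six-sided die (an assumed example) and confirms they agree.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

mean = sum(x * p for x, p in die_pmf.items())  # E[X]

# First line of the derivation: E[(X - E[X])^2]
var_definition = sum((x - mean) ** 2 * p for x, p in die_pmf.items())

# Last line of the derivation: E[X^2] - (E[X])^2
var_shortcut = sum(x ** 2 * p for x, p in die_pmf.items()) - mean ** 2

print(var_definition, var_shortcut)  # 35/12 35/12
```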

The Law of the Unconscious Statistician (LOTUS)

It's simple (and highly useful) enough in practice. If $g(X)$ is some function of Random Variable $X$,

$$E[g(X)] = \sum_x g(x) \cdot f_X(x)$$
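A sketch of what LOTUS buys you, under assumed inputs: $X$ uniform on $\{-2, -1, 0, 1, 2\}$ and $g(x) = x^2$. The long way builds the distribution of $Y = g(X)$ first. LOTUS skips that step and sums $g(x) \cdot f_X(x)$ directly.

```python
from collections import defaultdict
from fractions import Fraction

# Assumed example: X is uniform on {-2, -1, 0, 1, 2} and g(x) = x^2.
x_pmf = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}


def g(x):
    return x ** 2


# Long way: build the distribution of Y = g(X), then take its Expected Value.
y_pmf = defaultdict(Fraction)
for x, p in x_pmf.items():
    y_pmf[g(x)] += p  # x = -2 and x = 2 both land on y = 4
long_way = sum(y * p for y, p in y_pmf.items())

# LOTUS: sum g(x) * f_X(x) without ever working out the distribution of Y.
lotus = sum(g(x) * p for x, p in x_pmf.items())

print(long_way, lotus)  # 2 2
```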

I do not completely understand the proof but here it is for reference.

Footnotes

  1. 'Realizations' is the fancy word. Denoted by $x$, lowercase.

  2. I've seen $f(x)$ too as a shorthand. Note that $F_X(x)$ and $F(x)$ refer to the Cumulative Distribution Function.

  3. Any function of a Random Variable is a Random Variable.